The quality of knowledge retrieval is crucial in knowledge-intensive conversations. Two common strategies for improving retrieval quality are fine-tuning the retriever and generating a self-contained query, but both impose heavy burdens of expensive computation and elaborate annotation. In this paper, we propose QKConv, an unsupervised query-enhanced approach for knowledge-intensive conversations. QKConv consists of three modules: a query generator, an off-the-shelf knowledge selector, and a response generator. Without extra supervision, the end-to-end joint training of QKConv explores multiple candidate queries and uses the correspondingly selected knowledge to yield the target response. To evaluate the effectiveness of the proposed method, we conduct comprehensive experiments on conversational question answering, task-oriented dialogue, and knowledge-grounded conversation. Experimental results demonstrate that QKConv achieves state-of-the-art performance among unsupervised methods and competitive performance relative to supervised methods.
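To make the joint-training idea concrete, below is a minimal toy sketch of a QKConv-style training signal: several candidate queries are generated, each retrieves knowledge, and the candidate whose knowledge best explains the target response is preferred. All three modules here are hypothetical stand-ins (simple string heuristics), not the paper's actual models.

```python
def generate_candidate_queries(context, k=3):
    # Stand-in for the query generator: k candidate rewrites of the context.
    return [f"{context} (rewrite {i})" for i in range(k)]

def select_knowledge(query, corpus):
    # Stand-in for the off-the-shelf knowledge selector: pick the passage
    # with the largest word overlap with the query.
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return max(corpus, key=lambda passage: overlap(query, passage))

def response_score(response, knowledge):
    # Stand-in for the response generator's likelihood of the target
    # response given the selected knowledge.
    words = set(knowledge.lower().split())
    return sum(w in words for w in response.lower().split())

def explore_queries(context, target_response, corpus):
    # The unsupervised training signal: prefer the candidate query whose
    # retrieved knowledge makes the target response most likely.
    scored = [(response_score(target_response, select_knowledge(q, corpus)), q)
              for q in generate_candidate_queries(context)]
    return max(scored)
```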
Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training on computer programs, yet these models have remained English-centric. In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling, which learns patterns from monolingual NL or PL, and pivot-based translation language modeling, which relies on parallel data across many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of code-intelligence end tasks, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage in zero-shot prompting for multilingual code summarization and text-to-text translation. We will make our code and pre-trained models publicly available.
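Of the two pre-training objectives, span-corruption language modeling is the more mechanical one; here is a minimal self-contained sketch in the T5 style that the description suggests (span length, corruption rate, and sentinel naming are illustrative assumptions):

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span=3, rng=None):
    # Replace random spans with sentinel tokens in the encoder input and
    # emit each masked span after its sentinel as the decoder target.
    rng = rng or random.Random(0)
    inp, tgt, i, sid = [], [], 0, 0
    while i < len(tokens):
        if rng.random() < corruption_rate / mean_span:
            span = tokens[i:i + mean_span]
            inp.append(f"<extra_id_{sid}>")
            tgt.append(f"<extra_id_{sid}>")
            tgt.extend(span)
            sid += 1
            i += len(span)
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt

# Works identically on NL sentences and PL snippets, e.g.:
print(span_corrupt("def add ( a , b ) : return a + b".split()))
```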
Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance. However, these models focus only on understanding tasks and rely on encoder-only architectures. In this paper, we propose ERNIE-UniX2, a unified cross-lingual cross-modal pre-training framework for both generation and understanding tasks. ERNIE-UniX2 integrates multiple pre-training paradigms (e.g., contrastive learning and language modeling) on top of an encoder-decoder architecture and seeks to learn a better joint representation across languages and modalities. Furthermore, ERNIE-UniX2 can be seamlessly fine-tuned for a variety of downstream generation and understanding tasks. Pre-trained on both multilingual text-only and image-text datasets, ERNIE-UniX2 achieves SOTA results on various cross-lingual cross-modal generation and understanding tasks such as multimodal machine translation and multilingual visual question answering.
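As one illustration of how a contrastive objective can sit alongside language modeling in such a framework, here is a minimal symmetric InfoNCE loss over paired image/text embeddings; the exact loss composition and temperature used in ERNIE-UniX2 are assumptions here:

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    # Symmetric contrastive loss: matched image/text pairs (the diagonal)
    # should score higher than all mismatched pairs in the batch.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(8, 16)), rng.normal(size=(8, 16))))
```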
Speech representation learning has improved both speech understanding and speech synthesis for individual languages, but its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework in which we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any fine-tuning effort. Experiments on cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing confirm that our model outperforms speaker-embedding-based multi-speaker TTS methods. The code and model are publicly available at PaddleSpeech.
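The masking step can be sketched as follows; the mask ratios, the zero-fill for masked frames, and the <mask> phoneme token are illustrative assumptions rather than the paper's exact recipe:

```python
import numpy as np

def mask_speech_text(spectrogram, phonemes, frame_p=0.3, phone_p=0.3, rng=None):
    # Jointly mask spectrogram frames and phoneme tokens so the model must
    # reconstruct each modality from the surviving context of both.
    rng = rng or np.random.default_rng(0)
    spec = spectrogram.copy()
    frame_mask = rng.random(spec.shape[0]) < frame_p
    spec[frame_mask] = 0.0  # masked frames zeroed out
    phones = [("<mask>" if rng.random() < phone_p else p) for p in phonemes]
    return spec, frame_mask, phones

spec = np.random.rand(100, 80)  # 100 frames x 80 mel bins
masked_spec, mask, phones = mask_speech_text(spec, ["HH", "AH", "L", "OW"])
```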
As a first-line diagnostic imaging modality, radiography plays an essential role in the early detection of developmental dysplasia of the hip (DDH). Clinically, DDH diagnosis relies on manual measurement and subjective evaluation of various anatomical features on pelvic radiographs. This process is inefficient and error-prone, and it requires years of clinical experience. In this study, we propose a deep-learning-based system that automatically detects 14 keypoints from a radiograph, measures three anatomical angles (center-edge, Tönnis, and Sharp angles), and classifies DDH hips into grades I-IV. In addition, a novel data-driven scoring system is proposed to quantitatively integrate this information for DDH diagnosis. The proposed keypoint detection model achieved mean (95% confidence interval [CI]) average precisions of 0.807 (0.804-0.810) and 0.953 (0.947-0.960), significantly higher than those of experienced orthopedists (p < 0.0001). Moreover, the mean (95% CI) test diagnostic agreement (Cohen's kappa) obtained with the proposed scoring system was 0.84 (0.83-0.85), significantly higher than that obtained from individual angles via diagnostic criteria (0.76 [0.75-0.77]) and from orthopedists (0.71 [0.63-0.79]). To the best of our knowledge, this is the first study to achieve objective DDH diagnosis by leveraging deep-learning keypoint detection and integrating different anatomical measurements, which can provide reliable and interpretable support for clinical decision-making.
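To illustrate the measurement step, here is a toy sketch of turning two detected keypoints into a center-edge angle; the landmark definitions and the image-space vertical axis are simplified assumptions, not the paper's exact protocol:

```python
import math

def angle_between(v1, v2):
    # Angle in degrees between two 2D vectors.
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def center_edge_angle(head_center, acetabular_rim, vertical=(0.0, -1.0)):
    # Center-edge angle: between the vertical through the femoral head
    # center and the line from the head center to the lateral acetabular
    # rim (y grows downward in image coordinates).
    to_rim = (acetabular_rim[0] - head_center[0], acetabular_rim[1] - head_center[1])
    return angle_between(vertical, to_rim)

# Hypothetical pixel coordinates of two detected keypoints:
print(round(center_edge_angle((100.0, 200.0), (110.0, 170.0)), 1))  # ~18.4 degrees
```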
Percolation is an important topic in climate, physics, materials science, epidemiology, finance, and beyond. Predicting the percolation threshold with machine learning methods remains challenging. In this paper, we build a powerful graph convolutional neural network to study percolation in both supervised and unsupervised ways. From the supervised learning perspective, the graph convolutional neural network can be trained simultaneously and correctly on data from different lattice types, such as square and triangular lattices. From the unsupervised perspective, combining the graph convolutional neural network with the confusion method allows the percolation threshold to be obtained from a "W"-shaped performance curve. The findings of this work open up the possibility of building a more general framework that can probe percolation-related phenomena.
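To make the unsupervised "W"-shape idea concrete, here is a minimal confusion-method sketch on toy data; a scalar order parameter and a best-single-cut classifier stand in for the paper's graph convolutional network, and the 0.59 jump point mimics the known site-percolation threshold of the square lattice:

```python
import numpy as np

def best_cut_accuracy(feature, labels):
    # Best accuracy of any single threshold on a scalar feature
    # (a stand-in for training a classifier such as a GCN).
    y = labels[np.argsort(feature)]
    best = max((y == 1).mean(), (y == 0).mean())  # trivial constant classifiers
    for i in range(1, len(y)):
        best = max(best, ((y[:i] == 0).sum() + (y[i:] == 1).sum()) / len(y))
    return best

def confusion_curve(p_values, feature, trial_points):
    # Relabel samples as p >= p' for each trial threshold p'; the accuracy
    # curve is W-shaped, with its middle peak at the true threshold.
    return [best_cut_accuracy(feature, (p_values >= p_).astype(int))
            for p_ in trial_points]

rng = np.random.default_rng(0)
p = rng.uniform(0.3, 0.9, 400)  # occupation probabilities of toy samples
feature = (p > 0.59).astype(float) + rng.normal(0, 0.3, p.size)  # noisy order parameter
print(np.round(confusion_curve(p, feature, np.linspace(0.3, 0.9, 13)), 2))
```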
Conventional methods for image-text generation mainly tackle the two naturally bidirectional generation tasks separately, focusing on designing task-specific frameworks to improve the quality and fidelity of the generated samples. Recently, vision-language pre-training models have greatly improved the performance of image-to-text generation, but large-scale pre-trained models for text-to-image synthesis are still under-developed. In this paper, we propose ERNIE-ViLG, a unified generative pre-training framework for bidirectional image-text generation with transformer models. Based on image quantization models, we formulate both image generation and text generation as autoregressive generative tasks conditioned on text/image input. The bidirectional image-text generative modeling eases the semantic alignment across vision and language. For the text-to-image generation process, we further propose an end-to-end training method to jointly learn the visual sequence generator and the image reconstructor. To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs, which achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.
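The unified formulation reduces both directions to next-token prediction over one concatenated sequence; a schematic sketch follows, in which the separator ids and dummy image codes (which a real system would obtain from an image quantizer's codebook) are assumptions:

```python
def build_sequences(text_ids, image_codes, boi=1, bot=2):
    # Both directions become plain left-to-right generation: condition on one
    # modality's tokens, then autoregressively emit the other's.
    text_to_image = text_ids + [boi] + image_codes  # generate image codes from text
    image_to_text = image_codes + [bot] + text_ids  # generate text from image codes
    return text_to_image, image_to_text

# Dummy ids: 5/8/13 are text tokens, 901-903 are quantized image codes.
t2i, i2t = build_sequences([5, 8, 13], [901, 902, 903])
print(t2i, i2t)
```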
Pre-trained language models have achieved state-of-the-art results on various natural language processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models, and a model with 10 billion parameters was trained. ERNIE 3.0 outperformed state-of-the-art models on various NLP tasks. To explore the effect of scaling up further, we train ERNIE 3.0 Titan, a model with up to 260 billion parameters, on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable text. To reduce computation overhead and carbon emissions, we propose an online distillation framework for ERNIE 3.0 Titan, in which the teacher model teaches students and trains itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model to date. Empirical results show that ERNIE 3.0 Titan outperforms state-of-the-art models on 68 NLP datasets.
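The online distillation idea can be sketched as two losses computed in the same run: the teacher keeps optimizing its own task loss while the student matches the teacher's softened outputs, avoiding a separate distillation stage. The loss forms, weighting, and temperature below are illustrative assumptions:

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def online_distill_losses(teacher_logits, student_logits, labels, temperature=2.0):
    # Teacher: ordinary cross-entropy on the task labels (it keeps training).
    n = len(labels)
    teacher_ce = -np.log(softmax(teacher_logits)[np.arange(n), labels]).mean()
    # Student: KL-style match to the teacher's temperature-softened outputs.
    soft_t = softmax(teacher_logits, temperature)
    log_s = np.log(softmax(student_logits, temperature))
    student_kd = -(soft_t * log_s).sum(axis=-1).mean() * temperature ** 2
    return teacher_ce, student_kd

rng = np.random.default_rng(0)
print(online_distill_losses(rng.normal(size=(4, 10)), rng.normal(size=(4, 10)),
                            np.array([1, 2, 3, 4])))
```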
Task-oriented dialogue systems have been plagued by the difficulty of obtaining large-scale, high-quality annotated dialogues. Moreover, most publicly available datasets include only written conversations, which are insufficient to reflect actual human behavior in practical spoken dialogue systems. In this paper, we propose Task-oriented Dialogue Data Augmentation (TOD-DA), a novel model-agnostic data augmentation paradigm that boosts the robustness of task-oriented dialogue modeling. TOD-DA consists of two modules: 1) Dialogue Enrichment, which expands the training data for task-oriented conversations to ease data sparsity, and 2) Spoken Conversation Simulator, which imitates spoken-style expressions and speech recognition errors at various granularities to bridge the gap between written and spoken conversations. With this design, our approach ranked first in both tasks of DSTC10 Track 2, a benchmark for task-oriented dialogue modeling in the setting of spoken conversations, demonstrating the superiority and effectiveness of the proposed TOD-DA.
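A toy sketch of the Spoken Conversation Simulator side is below: it injects disfluencies, homophone-like substitutions, and word drops to mimic spoken style and ASR noise. The noise types follow the description above; the specific filler list, confusion table, and rates are illustrative assumptions:

```python
import random

FILLERS = ["uh", "um", "you know"]
HOMOPHONES = {"two": "too", "for": "four", "there": "their"}  # toy confusion pairs

def simulate_spoken(utterance, filler_p=0.1, sub_p=0.1, drop_p=0.05, rng=None):
    rng = rng or random.Random(0)
    out = []
    for word in utterance.lower().split():
        if rng.random() < drop_p:
            continue  # simulated deletion error
        if rng.random() < sub_p:
            word = HOMOPHONES.get(word, word)  # simulated substitution error
        if rng.random() < filler_p:
            out.append(rng.choice(FILLERS))  # simulated disfluency
        out.append(word)
    return " ".join(out)

print(simulate_spoken("book a table for two at seven please"))
```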
Artificial neural networks (ANNs) are typically confined to accomplishing pre-defined tasks by learning a set of static parameters. In contrast, biological neural networks (BNNs) can adapt to various new tasks by continually updating their connection weights based on observations, which aligns with the paradigm of learning effective learning rules in addition to static parameters, e.g., meta-learning. Among the wide range of biologically inspired learning rules, Hebbian plasticity updates neural network weights using local signals without the guidance of an explicit target function, closely mimicking the learning of BNNs. However, typical plastic ANNs using large-scale meta-parameters violate the nature of the genomics bottleneck and deteriorate generalization capacity. This work proposes a new learning paradigm that decomposes those connection-dependent plasticity rules into neuron-dependent rules, thus accommodating $O(n^2)$ learnable parameters with only $O(n)$ meta-parameters. The decomposed plasticity, together with different types of neuromodulation, is applied to a recurrent neural network starting from scratch to adapt to different tasks. Our algorithm is tested in challenging random 2D maze environments, where the agent must use its past experience to improve its performance without any explicit objective function or human intervention, i.e., learning by interacting. The results show that rules satisfying the genomics bottleneck adapt to out-of-distribution tasks better than previous model-based and plasticity-based meta-learning approaches.
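The decomposition can be sketched directly: instead of one meta-learned coefficient per connection ($O(n^2)$), each neuron carries pre- and post-synaptic coefficients ($O(n)$ in total), and the connection-level learning rate is their outer product. The exact rule and modulation in the paper may differ; this is a minimal Hebbian instance of the idea:

```python
import numpy as np

def decomposed_hebbian_update(W, pre, post, a, b, lr=0.01):
    # Delta W[i, j] = lr * a[i] * b[j] * pre[i] * post[j]: a local Hebbian
    # update whose per-connection rate a[i] * b[j] is factorized from
    # neuron-level meta-parameters a (presynaptic) and b (postsynaptic).
    return W + lr * np.outer(a * pre, b * post)

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.normal(0, 0.1, (n_in, n_out))                 # plastic weights, O(n^2)
a, b = rng.normal(size=n_in), rng.normal(size=n_out)  # meta-parameters, O(n)
pre = rng.normal(size=n_in)
post = np.tanh(pre @ W)
W = decomposed_hebbian_update(W, pre, post, a, b)
print(W)
```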